From 1991 to 2017, housing quality in New York improved dramatically; however, some sectors of the housing stock continue to face poor conditions, and some specific maintenance deficiencies remain comparatively prevalent. In this project, we develop an index of poor housing quality in New York that measures physical deficiencies and shows how their prevalence has shifted over time. We use data from the New York City Housing and Vacancy Survey (NYCHVS)1 and follow a procedure similar to the Poor Quality Index (PQI) of the American Housing Survey.2
The index is a weighted sum of 22 variables chosen by the authors. A variable was selected if the authors agreed that it described poor housing conditions. The index is not exhaustive, as the authors decided to build an index that is robust with respect to time. Variables that were collected in only a small number of years were disregarded, to avoid inflating scores in the years for which such a variable was collected. More data could potentially be collected or used to better suit our purpose.
The authors chose not to include financial data, such as rent or income, in the index. This is largely due to the complexity of implementing such a measure. In particular, poor housing condition may be a good predictor of income, but the reverse is not necessarily true. This is better visualised in Figure 5 below.
The index below is an ordinal measure: the higher the score, the more indicative it is of poor housing conditions. Some items in the index have been ranked accordingly by the authors. However, due to the qualitative nature of this scoring, the authors chose to rank only a few variables and to default to a score of two in all other cases. Further analysis to choose optimal weights is recommended.
| Item | Description | NYCHVS Variable | Score |
|---|---|---|---|
| 1 | Exterior Walls: Missing brick, siding, or other | d1 | 2 |
| 2 | Exterior Walls: Sloping or bulging walls | d2 | 2 |
| 3 | Exterior Walls: Major cracks | d3 | 2 |
| 4 | Exterior Walls: Loose or hanging cornice, roof, etc. | d4 | 2 |
| 5 | Interior Walls: Cracks or holes | 36a | 2 |
| 6 | Interior Walls: Broken plaster or peeling paint | 37a | 2 |
| 7 | Broken or missing windows | e1 | 5 |
| 8 | Rotten or loose windows | e2 | 2 |
| 9 | Boarded up windows | e3 | 3 |
| 10 | Sagging or sloping floors | g1 | 2 |
| 11 | Slanted/shifted doorsills or frames | g2 | 2 |
| 12 | Deep wear in floor causing depressions | g3 | 2 |
| 13 | Holes or missing flooring | g4 | 2 |
| 14 | Stairs: Loose, broken, or missing stair railings | f1 | 2 |
| 15 | Stairs: Loose, broken, or missing steps | f2 | 2 |
| 16 | No interior steps or stairways | f4 | 2 |
| 17 | No exterior steps or stairways | f5 | 2 |
| 18 | Number of heating equipment breakdowns | 32b | 2 per breakdown |
| 19 | Kitchen facilities functioning | 26c | 3 if no, 5 if no kitchen facilities |
| 20 | Toilet breakdowns | 25c | 3 if any, 5 if no toilet or plumbing |
| 21 | Presence of mice or rats | 35a | 3 |
| 22 | Water Leakage | 38a | 3 |
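
To make the scoring rule in the table concrete, the following is a minimal sketch of how one unit's score could be computed. It is not the authors' implementation: the dictionary keys reuse the NYCHVS variable labels from the table, but the value coding (booleans for yes/no items, a count for heating breakdowns, and the level labels for kitchen and toilet) is a hypothetical simplification of the survey's actual codes, and missing items are assumed to mean no deficiency.

```python
# Points for the yes/no deficiency items (table items 1-17, 21 and 22).
BINARY_POINTS = {
    "d1": 2, "d2": 2, "d3": 2, "d4": 2,   # exterior walls
    "36a": 2, "37a": 2,                   # interior walls
    "e1": 5, "e2": 2, "e3": 3,            # windows
    "g1": 2, "g2": 2, "g3": 2, "g4": 2,   # floors
    "f1": 2, "f2": 2, "f4": 2, "f5": 2,   # stairs
    "35a": 3,                             # mice or rats
    "38a": 3,                             # water leakage
}

# Points for the multi-level items. The level labels are hypothetical.
KITCHEN_POINTS = {"functioning": 0, "not_functioning": 3, "no_facilities": 5}
TOILET_POINTS = {"no_breakdown": 0, "breakdown": 3, "no_toilet_or_plumbing": 5}
HEATING_POINTS_PER_BREAKDOWN = 2


def pqi_score(unit):
    """Return the poor quality index score for one unit record (a dict).

    Assumption: an item that is absent from the record counts as no deficiency.
    """
    score = sum(
        points for var, points in BINARY_POINTS.items() if unit.get(var, False)
    )
    score += HEATING_POINTS_PER_BREAKDOWN * unit.get("32b", 0)
    score += KITCHEN_POINTS[unit.get("26c", "functioning")]
    score += TOILET_POINTS[unit.get("25c", "no_breakdown")]
    return score


# Broken windows, rodents, two heating breakdowns, no kitchen facilities.
example_unit = {"e1": True, "35a": True, "32b": 2, "26c": "no_facilities"}
print(pqi_score(example_unit))  # 5 + 3 + 2*2 + 5 = 17
```

Keeping the weights in plain dictionaries makes it easy to try alternative weightings, which is the further analysis recommended above.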
Figure 1 shows the poor quality index scores for the 156,230 occupied units in the New York housing dataset from 1991 to 2017. The frequency distribution is skewed to the right. Overall, 45 percent of the units scored 0. The highest score, 54 points, occurred in 1993. The year 2008 had the highest share (64%) of units with a poor quality score of 0. Since we are showing a distribution, we could have used histograms, but we chose a line graph instead so that all years can be plotted simultaneously and distinguished from one another. We considered animation but decided against it, as it might have made the graphic overwhelming. Instead, the graph is interactive, with a legend that can subset the years shown and a tooltip.
Figure 2 shows the percent of occupied units with poor quality scores. Over the period from 1991 to 2017, most of the units had poor quality scores between 1 and 10 points; very few units had poor quality scores over 20 points. For this data we chose bar plots over tables, although either would have sufficed. Bar plots have the advantage that trends over time are more easily visualized, but the small bars in the final plot are hard to distinguish. A tooltip is provided for that reason.
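
The percentages behind a bar plot of this kind can be sketched as follows. The band edges (0, 1-10, 11-20, over 20) are an assumption chosen to match the ranges mentioned in the text; the bands actually used in Figure 2 may differ.

```python
from collections import Counter

# Hypothetical score bands; only "1-10" and "over 20" are named in the text.
BANDS = ("0", "1-10", "11-20", "over 20")


def score_band(score):
    """Map a poor quality score to its band label."""
    if score == 0:
        return "0"
    if score <= 10:
        return "1-10"
    if score <= 20:
        return "11-20"
    return "over 20"


def band_percentages(scores):
    """Return the percent of units falling in each band for one survey year."""
    counts = Counter(score_band(s) for s in scores)
    total = len(scores)
    return {band: 100 * counts.get(band, 0) / total for band in BANDS}


# Toy scores for ten units, not survey data.
toy_scores = [0, 0, 0, 0, 0, 3, 7, 10, 12, 25]
print(band_percentages(toy_scores))
```

Calling `band_percentages` once per survey year gives one group of bars per year.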
Figure 3 tracks trends in poor quality index scores from 1991 to 2017. We report the means, medians, 75th percentiles, 95th percentiles, and 99th percentiles. In most years, the median poor quality score was 0. The mean fell from 4.0 in 1991 to 2.5 in 2017. The 99th percentiles clearly show the improvement of housing in New York (from 25 poor quality points in 1991 to 18 poor quality points in 2017). Line graphs were chosen to represent multiple trends over time. Since the lines are on a similar scale, they are preferable to stacked bar plots, which would require interpreting the heights of individual bar segments. The trends shown are a standard percentile partition compared to the mean.
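
The per-year summary statistics can be sketched as below. This is an illustration under stated assumptions, not the authors' code: it uses the nearest-rank definition of a percentile and treats every unit equally, whereas an analysis of survey data may apply sampling weights or a different percentile convention. The function names and toy scores are hypothetical.

```python
import math
from statistics import mean


def percentile(values, p):
    """Nearest-rank percentile: the smallest value with at least p% of the
    data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def summarize_by_year(scores_by_year, percentiles=(50, 75, 95, 99)):
    """Return {year: {"mean": ..., "p50": ..., ...}} for each survey year."""
    summary = {}
    for year, scores in sorted(scores_by_year.items()):
        row = {"mean": mean(scores)}
        for p in percentiles:
            row[f"p{p}"] = percentile(scores, p)
        summary[year] = row
    return summary


# Toy scores for two survey years, not survey data.
toy = {
    1991: [0, 0, 0, 2, 4, 6, 8, 10, 12, 18],
    2017: [0, 0, 0, 0, 0, 2, 2, 3, 5, 8],
}
for year, row in summarize_by_year(toy).items():
    print(year, row)
```

Each key of the inner dictionaries corresponds to one line in a plot like Figure 3.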
Household income has a ceiling in the data: any income greater than 10 million is capped.
Household income increases slowly over time, but this is expected given inflation. Furthermore, the percentiles are calculated over the entirety of NYC, so we may be missing a spatial component of the data. This is the same plot style as in Figure 4, but with a non-standard percentile partition to investigate the lower quantiles. The graph is interactive.
Figure 7 shows the raw relationship between PQI and household income. It is meant to show how noisy the data are. Particularly noteworthy are the millionaires living in housing with poor quality index scores. The plot itself has quite a few flaws. For instance, the density of points is not clear: almost half of the data lie on the line PQI = 0. However, the plot is not meant for inference, but to point out problems.
Figure 8 shows a map of New York City sub-boroughs and shades the regions by mean household income and index score. A darker shade of blue indicates lower mean income, and a darker shade of red indicates lower housing quality. The plot shows what one might expect, i.e., neighborhoods with lower quality housing generally correspond to a lower average household income. However, mean household income clearly does not predict housing quality, indicating that it would not have been appropriate to include it in the index without further consideration. The spatial plot gives the problem a motivating context that scatter plots do not, but scatter plots are employed later.
We show a quantile relationship between the 5th percentile of household income and the 95th percentile of housing quality across neighborhoods (sub-boroughs). That is, neighborhoods with very poor housing are reasonably correlated with neighborhoods having very low-income households, but this does not necessarily mean that low-income families tend to live in poor quality housing, or vice versa.
This shows the strongest relationship between the variables of interest. The axes are costly to interpret, and there is concern that the relationship could easily be misinterpreted. This plot is perhaps the best presented here. The plotted axes pose some interpretation challenges, but the plot clearly shows the discussed trends over time while maintaining a clear relation between the variables.
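
The quantile comparison can be sketched as follows: for each sub-borough, take the 5th percentile of household income and the 95th percentile of the index score, then correlate the two across sub-boroughs. This is an illustrative sketch, not the authors' procedure; it assumes unweighted nearest-rank percentiles and a Pearson correlation, and the record layout, function names and toy values are hypothetical.

```python
import math
from collections import defaultdict


def percentile(values, p):
    """Nearest-rank percentile of a list of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def neighborhood_quantiles(records):
    """records: iterable of (sub_borough, household_income, pqi_score).

    Returns {sub_borough: (5th percentile income, 95th percentile PQI)}.
    """
    incomes, scores = defaultdict(list), defaultdict(list)
    for area, income, pqi in records:
        incomes[area].append(income)
        scores[area].append(pqi)
    return {
        area: (percentile(incomes[area], 5), percentile(scores[area], 95))
        for area in incomes
    }


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Toy records for three made-up areas, not survey data.
toy_records = (
    [("A", inc, q) for inc, q in zip([10000, 20000, 30000, 40000], [0, 5, 10, 20])]
    + [("B", inc, q) for inc, q in zip([30000, 50000, 70000, 90000], [0, 0, 4, 8])]
    + [("C", inc, q) for inc, q in zip([60000, 80000, 100000, 150000], [0, 0, 0, 2])]
)
quantiles = neighborhood_quantiles(toy_records)
low_income = [quantiles[a][0] for a in sorted(quantiles)]
high_pqi = [quantiles[a][1] for a in sorted(quantiles)]
print(quantiles, pearson(low_income, high_pqi))
```

A negative coefficient here would mean that areas whose poorest households earn less also tend to have worse housing at the bad end of the index. As noted above, this is a statement about neighborhoods, not about individual households.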